Efficient failure detection and consensus at extreme-scale systems
Authors
Abstract
<span>Distributed systems and extreme-scale systems have become ubiquitous in recent years and are seen throughout academia, organizations, business, home, and government sectors. Peer-to-peer (P2P) technology is a typical distributed system model that is gaining popularity for delivering computing resources and services. Distributed systems try to increase their availability in the event of frequent component failures, and keeping the system functioning in such a scenario is notoriously difficult. In order to identify component failures and achieve global agreement (consensus) on the failed components, this paper implemented an efficient failure detection and consensus algorithm based on fail-stop type process failures. The proposed algorithm is fault-tolerant to process failures occurring before and during the execution of the algorithm. It works with an epidemic gossip protocol, a randomly generated paradigm of computation and communication that is both fault-tolerant and scalable. A simulation of extreme-scale information dissemination shows that global agreement can be achieved. The P2P simulator PeerSim is used to implement and test the proposed algorithm. The results exhibited high scalability, and at the same time the algorithm detected all failed processes; the status of all processes is maintained in a Boolean matrix.</span>
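The idea summarized in the abstract (fail-stop failures, random gossip, and a Boolean matrix of process status) can be illustrated with a small simulation. The sketch below is not the paper's algorithm and does not use PeerSim; it is a minimal Python illustration under stated assumptions. The function name `gossip_failure_detection`, the matrix name `suspects`, direct detection by an unanswered ping, and the global convergence check performed by the simulator are all hypothetical choices made for this example.

```python
import random


def gossip_failure_detection(n, failed, seed=0, max_cycles=1000):
    """Simulate push gossip of failure suspicions among n processes.

    Assumptions (not taken from the paper): failures are fail-stop and all
    occur before the first cycle; a process detects a failure when a randomly
    chosen peer does not answer; the simulator, acting as a global observer,
    decides when every alive process holds the same failure list.
    """
    rng = random.Random(seed)
    failed = set(failed)
    alive = [p for p in range(n) if p not in failed]
    # suspects[i][j] is True when process i believes process j has failed.
    suspects = [[False] * n for _ in range(n)]
    # Row that every alive process should eventually hold.
    target = [p in failed for p in range(n)]

    for cycle in range(1, max_cycles + 1):
        for i in alive:
            # Pick one random peer other than i.
            j = rng.choice([p for p in range(n) if p != i])
            if j in failed:
                # No reply from a fail-stop process: record the suspicion.
                suspects[i][j] = True
            else:
                # Push i's current suspicions to j (element-wise OR).
                for k in range(n):
                    if suspects[i][k]:
                        suspects[j][k] = True
        # Agreement: every alive process has the complete failure list.
        if all(suspects[i] == target for i in alive):
            return cycle, suspects
    return None, suspects


cycles, matrix = gossip_failure_detection(32, {3, 7, 19}, seed=1)
print("cycles until agreement:", cycles)
```

Because a suspicion is only recorded after an unanswered ping, this sketch never marks an alive process as failed. A real deployment has no global observer, so each process would need its own local rule for deciding that agreement has been reached.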
Similar resources
Epidemic Failure Detection and Consensus for Extreme Parallelism
Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum’s User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault ...
Towards Efficient Data Movement at Extreme Scale
High Performance Computing (HPC) systems are equipped with a massive number of processors, high-bandwidth networks, large capacity fast storage, and specialized parallel software to provide answers to challenging scientific and engineering questions. Over the past decades, data generation capabilities in these domains have grown rapidly due to the emergence of large-scale instruments such as te...
Communication-efficient failure detection and consensus in omission environments
Failure detectors have been shown to be a very useful mechanism to solve the consensus problem in the crash failure model, for which a number of communication-efficient algorithms have been proposed. In this paper we deal with the definition, implementation and use of communication-efficient failure detectors in the general omission failure model, where pro...
Failure Detection and Exclusion via Range Consensus
With the rise of enhanced GNSS services over the next decade (i.e. the modernized GPS, Galileo, GLONASS, and Compass constellations), the number of ranging sources (satellites) available for a positioning will significantly increase to more than double the current value. One can no longer assume that the probability of failure for more than one satellite within a certain timeframe is negligible...
Journal
Journal title: International Journal of Power Electronics and Drive Systems
Year: 2022
ISSN: 2722-2578, 2722-256X
DOI: https://doi.org/10.11591/ijece.v12i5.pp5339-5347